139 research outputs found
A Generalization of the Convex Kakeya Problem
Given a set of line segments in the plane, not necessarily finite, what is a
convex region of smallest area that contains a translate of each input segment?
This question can be seen as a generalization of Kakeya's problem of finding a
convex region of smallest area such that a needle can be rotated through 360
degrees within this region. We show that there is always an optimal region that
is a triangle, and we give an optimal Θ(n log n)-time algorithm to compute
such a triangle for a given set of n segments. We also show that, if the goal
is to minimize the perimeter of the region instead of its area, then placing
the segments with their midpoint at the origin and taking their convex hull
results in an optimal solution. Finally, we show that for any compact convex
figure G, the smallest enclosing disk of G is a smallest-perimeter region
containing a translate of every rotated copy of G. Comment: 14 pages, 9 figures
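The perimeter result in the abstract above (center every segment at the origin, then take the convex hull) can be sketched in a few lines. This is a minimal illustration for a finite set of segments, not the authors' implementation; the function names `center_segment`, `convex_hull`, `min_perimeter_region` and `perimeter` are hypothetical, and the hull routine is the standard monotone-chain method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def center_segment(seg: Segment) -> Tuple[Point, Point]:
    """Translate a segment so that its midpoint lies at the origin."""
    (x1, y1), (x2, y2) = seg
    hx, hy = (x2 - x1) / 2.0, (y2 - y1) / 2.0
    return (-hx, -hy), (hx, hy)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_perimeter_region(segments: List[Segment]) -> List[Point]:
    """Convex hull of all segments after centering each one at the origin."""
    endpoints: List[Point] = []
    for seg in segments:
        endpoints.extend(center_segment(seg))
    return convex_hull(endpoints)


def perimeter(polygon: List[Point]) -> float:
    """Sum of the edge lengths of a closed polygon."""
    n = len(polygon)
    return sum(math.dist(polygon[i], polygon[(i + 1) % n]) for i in range(n))
```

For example, a horizontal and a vertical segment of length 2, wherever they lie in the plane, yield a square with vertices at (±1, 0) and (0, ±1).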
Minimum message length inference of secondary structure from protein coordinate data
Motivation: Secondary structure underpins the folding pattern and architecture of most proteins. Accurate assignment of the secondary structure elements is therefore an important problem. Although many approximate solutions of the secondary structure assignment problem exist, the statement of the problem has resisted a consistent and mathematically rigorous definition. A variety of comparative studies have highlighted major disagreements in the way the available methods define and assign secondary structure to coordinate data.
Towards Reliable Automatic Protein Structure Alignment
A variety of methods have been proposed for structure similarity calculation,
which are called structure alignment or superposition. One major shortcoming in
current structure alignment algorithms is in their inherent design, which is
based on local structure similarity. In this work, we propose a method to
incorporate global information in obtaining optimal alignments and
superpositions. Our method, when applied to optimizing the TM-score and the GDT
score, produces significantly better results than current state-of-the-art
protein structure alignment tools. Specifically, if the highest TM-score found
by TMalign is lower than 0.6 and the highest TM-score found by one of the
tested methods is higher than 0.5, there is a probability of 42% that
TMalign failed to find TM-scores higher than 0.5, while the same probability
is reduced to 2% if our method is used. This could significantly improve the
accuracy of fold detection if the cutoff TM-score of 0.5 is used.
In addition, existing structure alignment algorithms focus on structure
similarity alone and simply ignore other important similarities, such as
sequence similarity. Our approach has the capacity to incorporate multiple
similarities into the scoring function. Results show that sequence similarity
aids in finding high quality protein structure alignments that are more
consistent with eye-examined alignments in HOMSTRAD. Even when structure
similarity itself fails to find alignments with any consistency with
eye-examined alignments, our method remains capable of finding alignments
highly similar to, or even identical to, eye-examined alignments. Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
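To make the 0.5 cutoff discussed in the abstract above concrete, here is a minimal sketch of the standard TM-score formula evaluated for one fixed superposition. The abstract itself does not give the formula. The function names are hypothetical, and the full TM-score is the maximum of this quantity over all superpositions, which this sketch does not search for. It also assumes a target length above 15 residues, where the usual distance scale d0 is well defined.

```python
from typing import Iterable


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        # Assumption of this sketch: short chains are out of scope.
        raise ValueError("this sketch assumes target_length > 15")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_for_superposition(distances: Iterable[float],
                               target_length: int) -> float:
    """Score one superposition.

    distances: distance (in angstroms) between each aligned residue pair
    after superposition; unaligned residues simply contribute nothing.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

With this definition, a perfect superposition of all residues scores 1.0, and aligning only half of the target perfectly scores 0.5, which is the fold-detection cutoff mentioned above.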
Tableau-based protein substructure search using quadratic programming
Background: Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database. Results: We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with some existing techniques. Conclusion: We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where it has comparable or superior accuracy to existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.
VIPERdb2: an enhanced and web API enabled relational database for structural virology
VIPERdb (http://viperdb.scripps.edu) is a relational database and a web portal for icosahedral virus capsid structures. Our aim is to provide a comprehensive resource specific to the needs of the virology community, with an emphasis on the description and comparison of derived data from structural and computational analyses of the virus capsids. In the current release, VIPERdb2, we implemented a useful and novel method to represent capsid protein residues in the icosahedral asymmetric unit (IAU) using azimuthal polar orthographic projections, otherwise known as Φ–Ψ (Phi–Psi) diagrams. In conjunction with a new Application Programming Interface (API), these diagrams can be used as a dynamic interface to the database to map residues (categorized as surface, interface and core residues) and identify family-wide conserved residues, including hotspots at the interfaces. Additionally, we enhanced the interactivity with the database by interfacing with web-based tools. In particular, the applications Jmol and STRAP were implemented to visualize and interact with the virus molecular structures and provide sequence–structure alignment capabilities. Together with extended curation practices that maintain data uniformity, a relational database implementation based on a schema for macromolecular structures and the APIs provided will greatly enhance the ability to do structural bioinformatics analysis of virus capsids.
The Use of Experimental Structures to Model Protein Dynamics
The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are well-established, simple biophysical methods for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs.
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
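The PCA step described in the abstract above can be illustrated with a small, dependency-free sketch. It is not the chapter's own program. It assumes each structure has already been superposed onto a common frame and flattened into one row of coordinates, and it extracts only the first principal component by power iteration; a numerical library would normally be used instead. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mean_center(rows: Matrix) -> Matrix:
    """Subtract the average structure from every row."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of mean-centered coordinate rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_principal_component(cov: Matrix,
                              iterations: int = 200) -> Tuple[float, List[float]]:
    """Leading eigenpair of the covariance matrix by power iteration.

    Caveat of this sketch: the uniform start vector must not be
    orthogonal to the leading eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return eigenvalue, v
```

The eigenvalue is the variance captured along the component, and projecting each centered row onto the eigenvector gives the coordinate of that structure along the first PC.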
An efficient RANSAC hypothesis evaluation using sufficient statistics for RGB-D pose estimation
Achieving autonomous flight in GPS-denied environments begins with pose estimation in three-dimensional space, and this is much more challenging for an MAV in a swarm robotic system due to limited computational resources. In vision-based pose estimation, outlier detection is the most time-consuming step. This usually involves a RANSAC procedure using the reprojection-error method for hypothesis evaluation. The realignment-based hypothesis evaluation method is observed to be more accurate, but its considerably slower speed makes it unsuitable for robots with limited resources. We use sufficient statistics of least-squares minimisation to speed up this process. The additive nature of these sufficient statistics makes it possible to compute pose estimates in each evaluation by reusing previously computed statistics. Thus estimates need not be calculated from scratch each time. The proposed method is tested on standard RANSAC, Preemptive RANSAC and R-RANSAC using benchmark datasets. The results show that the use of sufficient statistics speeds up the outlier detection process with realignment hypothesis evaluation for all RANSAC variants, achieving an execution speed of up to 6.72 times
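The idea of additive sufficient statistics for least-squares alignment, as described in the abstract above, can be sketched as follows. This is a deliberately simplified illustration, not the paper's method: it assumes 2D point correspondences so that the rotation has a closed form without a matrix decomposition (the paper's RGB-D setting is 3D), and the class name `AlignStats` and its fields are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class AlignStats:
    """Running sums that fully determine the least-squares rigid fit q ~ R p + t."""
    n: int = 0
    spx: float = 0.0      # sum of p.x
    spy: float = 0.0      # sum of p.y
    sqx: float = 0.0      # sum of q.x
    sqy: float = 0.0      # sum of q.y
    s_dot: float = 0.0    # sum of p . q
    s_cross: float = 0.0  # sum of p x q (2D scalar cross product)

    def add(self, p: Point, q: Point) -> None:
        """Fold one correspondence into the statistics in O(1)."""
        (px, py), (qx, qy) = p, q
        self.n += 1
        self.spx += px
        self.spy += py
        self.sqx += qx
        self.sqy += qy
        self.s_dot += px * qx + py * qy
        self.s_cross += px * qy - py * qx

    def __add__(self, other: "AlignStats") -> "AlignStats":
        """Statistics of a union of disjoint sets are just the sums."""
        return AlignStats(self.n + other.n,
                          self.spx + other.spx, self.spy + other.spy,
                          self.sqx + other.sqx, self.sqy + other.sqy,
                          self.s_dot + other.s_dot,
                          self.s_cross + other.s_cross)

    def solve(self) -> Tuple[float, float, float]:
        """Return (theta, tx, ty) minimising sum |R(theta) p + t - q|^2."""
        if self.n < 2:
            raise ValueError("need at least two correspondences")
        # Centered sums recovered from the raw sums, no second pass needed.
        dot_c = self.s_dot - (self.spx * self.sqx + self.spy * self.sqy) / self.n
        cross_c = self.s_cross - (self.spx * self.sqy - self.spy * self.sqx) / self.n
        theta = math.atan2(cross_c, dot_c)
        c, s = math.cos(theta), math.sin(theta)
        mpx, mpy = self.spx / self.n, self.spy / self.n
        mqx, mqy = self.sqx / self.n, self.sqy / self.n
        return theta, mqx - (c * mpx - s * mpy), mqy - (s * mpx + c * mpy)
```

Because the statistics are plain sums, a hypothesis evaluation that grows or merges an inlier set can update them incrementally and call `solve()` again, rather than refitting from all points.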
- …